Benchmarking Declarative Approximate Selection

نویسنده

  • Oktie Hassanzadeh
چکیده

Benchmarking Declarative Approximate Selection Predicates Oktie Hassanzadeh Master of Science Graduate Department of Computer Science University of Toronto 2007 Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize data quality primitives on top of any relational data source. A primary advantage of such an approach is the ease of use and integration with existing applications. Over the last couple of years several similarity predicates have been proposed for common quality primitives (approximate selections, joins, etc.) and have been fully expressed using declarative SQL statements. In this thesis, new similarity predicates are proposed along with their declarative realization, based on notions of probabilistic information retrieval. Then, full declarative specifications of previously proposed similarity predicates in the literature are presented, grouped into classes according to their primary characteristics. Finally, a thorough performance and accuracy study comparing a large number of similarity predicates for data cleaning operations is performed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards the evaluation of the LarKC Reasoner Plug-ins

In this paper, we present an initial framework of evaluation and benchmarking of reasoners deployed within the LarKC platform, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. We discuss the evaluation methods, measures, benchmarks, and performance targets for the plug-ins to be develo...

متن کامل

The Efficacy of Procedural and Declarative Learning Strategies on EFL Students’ Oral Proficiency

Style and strategies in EFL learning contexts and the effects of task types were explored to enhance language learning strategies. Using a quantitative pre-test, post-test design and interviews, this study investigated the effects of procedural and declarative learning strategies on EFL learners’ acquisition of English past tense performing narrative tasks. The participants were 396 male and fe...

متن کامل

DataSynth: Generating Synthetic Data using Declarative Constraints

A variety of scenarios such as database system and application testing, data masking, and benchmarking require synthetic database instances, often having complex data characteristics. We present DataSynth, a flexible tool for generating synthetic databases. DataSynth uses a simple and powerful declarative abstraction based on cardinality constraints to specify data characteristics, and uses sop...

متن کامل

Declarative generation of synthetic XML data

Synthetic data can be extremely useful in testing and evaluating algorithms, tools and systems. Most synthetic data generators available today are the result of individual benchmarking efforts. Typically, these are complex programs in which the specifications of both the structure and the contents of the data are hard-coded. As a result, it is often difficult to customize these tools for produc...

متن کامل

Decoys Selection in Benchmarking Datasets: Overview and Perspectives

Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007